{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Collecting weather data from an API\n",
    "\n",
    "## About the data\n",
    "In this notebook, we will be collecting daily weather data from the [National Centers for Environmental Information (NCEI) API](https://www.ncdc.noaa.gov/cdo-web/webservices/v2). We will use the Global Historical Climatology Network - Daily (GHCND) data set; see the documentation [here](https://www1.ncdc.noaa.gov/pub/data/cdo/documentation/GHCND_documentation.pdf).\n",
    "\n",
    "*Note: The NCEI is part of the National Oceanic and Atmospheric Administration (NOAA) and, as you can see from the URL for the API, this resource was created when the NCEI was called the NCDC. Should the URL for this resource change in the future, you can search for the NCEI weather API to find the updated one.*\n",
    "\n",
    "## Using the NCEI API\n",
    "Paste your token below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests\n",
    "\n",
    "def make_request(endpoint, payload=None):\n",
    "    \"\"\"\n",
    "    Make a request to a specific endpoint on the weather API\n",
    "    passing headers and optional payload.\n",
    "    \n",
    "    Parameters:\n",
    "        - endpoint: The endpoint of the API you want to \n",
    "                    make a GET request to.\n",
    "        - payload: A dictionary of data to pass along \n",
    "                   with the request.\n",
    "    \n",
    "    Returns:\n",
    "        Response object.\n",
    "    \"\"\"\n",
    "    return requests.get(\n",
    "        f'https://www.ncdc.noaa.gov/cdo-web/api/v2/{endpoint}',\n",
    "        headers={\n",
    "            'token': 'PASTE_YOUR_TOKEN_HERE'\n",
    "        },\n",
    "        params=payload\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Collect All Data Points for 2018 In NYC (Various Stations)\n",
    "We can make a loop to query for all the data points one day at a time. Here we create a list of all the results:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Gathering data for 2018-12-31'"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "import datetime\n",
    "\n",
    "from IPython import display # for updating the cell dynamically\n",
    "\n",
    "current = datetime.date(2018, 1, 1)\n",
    "end = datetime.date(2019, 1, 1)\n",
    "\n",
    "results = []\n",
    "\n",
    "while current < end:\n",
    "    # update the cell with status information\n",
    "    display.clear_output(wait=True)\n",
    "    display.display(f'Gathering data for {str(current)}')\n",
    "    \n",
    "    response = make_request(\n",
    "        'data', \n",
    "        {\n",
    "            'datasetid' : 'GHCND', # Global Historical Climatology Network - Daily (GHCND) dataset\n",
    "            'locationid' : 'CITY:US360019', # NYC\n",
    "            'startdate' : current,\n",
    "            'enddate' : current,\n",
    "            'units' : 'metric',\n",
    "            'limit' : 1000 # max allowed\n",
    "        }\n",
    "    )\n",
    "\n",
    "    if response.ok:\n",
    "        # we extend the list instead of appending to avoid getting a nested list\n",
    "        results.extend(response.json()['results'])\n",
    "\n",
    "    # update the current date to avoid an infinite loop\n",
    "    current += datetime.timedelta(days=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, we can create a dataframe with all this data. Notice there are multiple stations with values for each datatype on a given day. We don't know what the stations are, but we can look them up and add them to the data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>attributes</th>\n",
       "      <th>datatype</th>\n",
       "      <th>date</th>\n",
       "      <th>station</th>\n",
       "      <th>value</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>,,N,</td>\n",
       "      <td>PRCP</td>\n",
       "      <td>2018-01-01T00:00:00</td>\n",
       "      <td>GHCND:US1CTFR0039</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>,,N,</td>\n",
       "      <td>PRCP</td>\n",
       "      <td>2018-01-01T00:00:00</td>\n",
       "      <td>GHCND:US1NJBG0015</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>,,N,</td>\n",
       "      <td>SNOW</td>\n",
       "      <td>2018-01-01T00:00:00</td>\n",
       "      <td>GHCND:US1NJBG0015</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>,,N,</td>\n",
       "      <td>PRCP</td>\n",
       "      <td>2018-01-01T00:00:00</td>\n",
       "      <td>GHCND:US1NJBG0017</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>,,N,</td>\n",
       "      <td>SNOW</td>\n",
       "      <td>2018-01-01T00:00:00</td>\n",
       "      <td>GHCND:US1NJBG0017</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  attributes datatype                 date            station  value\n",
       "0       ,,N,     PRCP  2018-01-01T00:00:00  GHCND:US1CTFR0039    0.0\n",
       "1       ,,N,     PRCP  2018-01-01T00:00:00  GHCND:US1NJBG0015    0.0\n",
       "2       ,,N,     SNOW  2018-01-01T00:00:00  GHCND:US1NJBG0015    0.0\n",
       "3       ,,N,     PRCP  2018-01-01T00:00:00  GHCND:US1NJBG0017    0.0\n",
       "4       ,,N,     SNOW  2018-01-01T00:00:00  GHCND:US1NJBG0017    0.0"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "df = pd.DataFrame(results)\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Save this data to a file:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.to_csv('data/nyc_weather_2018.csv', index=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "and write it to the database:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sqlite3\n",
    "\n",
    "with sqlite3.connect('data/weather.db') as connection:\n",
    "    df.to_sql(\n",
    "        'weather', connection, index=False, if_exists='replace'\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For learning about merging dataframes, we will also get the data mapping station IDs to information about the station:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "response = make_request(\n",
    "    'stations', \n",
    "    {\n",
    "        'datasetid' : 'GHCND', # Global Historical Climatology Network - Daily (GHCND) dataset\n",
    "        'locationid' : 'CITY:US360019', # NYC\n",
    "        'limit' : 1000 # max allowed\n",
    "    }\n",
    ")\n",
    "\n",
    "stations = pd.DataFrame(response.json()['results'])[['id', 'name', 'latitude', 'longitude', 'elevation']]\n",
    "stations.to_csv('data/weather_stations.csv', index=False)\n",
    "\n",
    "with sqlite3.connect('data/weather.db') as connection:\n",
    "    stations.to_sql(\n",
    "        'stations', connection, index=False, if_exists='replace'\n",
    "    )"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
